Adversarial examples for extreme multilabel text classification
Abstract
Extreme Multilabel Text Classification (XMTC) is a text classification problem in which (i) the output space is extremely large, (ii) each data point may have multiple positive labels, and (iii) the data follows a strongly imbalanced distribution. XMTC has applications in recommendation systems and in automatic tagging of web-scale documents. Research on it has focused on improving prediction accuracy and on dealing with imbalanced data. However, the robustness of deep-learning-based XMTC models against adversarial examples has been largely underexplored. In this paper, we investigate the behaviour of XMTC models under adversarial attacks. To this end, we first define adversarial attacks for multilabel text classification problems. We categorize attacks on multilabel text classifiers into two types: (a) positive-to-negative, where the target positive label should fall out of the top-k predicted labels, and (b) negative-to-positive, where the target negative label should be among the top-k predicted labels. Then, through experiments on APLC-XLNet and AttentionXML, we show that XMTC models are highly vulnerable to positive-to-negative attacks but more robust to negative-to-positive ones. Furthermore, our experiments show that the success rate of positive-to-negative attacks follows an imbalanced distribution. More precisely, tail classes are highly vulnerable: for these classes, an attacker can generate adversarial samples with high similarity to the actual data points. To overcome this problem, we explore the effect of rebalanced loss functions in XMTC. Not only do they increase accuracy on tail classes, but they also improve the robustness of these classes against adversarial attacks. The code is available at https://github.com/xmc-aalto/adv-xmtc .
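The two attack categories above can be stated as simple success checks on a classifier's label scores. What follows is a minimal sketch, assuming a model that returns one real-valued score per label for the clean input and for the perturbed input. The function names are hypothetical. So is the extra requirement that a positive-to-negative target must start inside the clean top-k, and a negative-to-positive target outside it. The sketch does not show how the adversarial text itself is generated.

```python
from __future__ import annotations

import numpy as np


def top_k(scores: np.ndarray, k: int) -> set[int]:
    """Return the indices of the k highest-scoring labels."""
    # A stable sort on the negated scores keeps the lower index first on ties.
    order = np.argsort(-scores, kind="stable")
    return set(order[:k].tolist())


def positive_to_negative_success(clean: np.ndarray, adv: np.ndarray,
                                 target: int, k: int) -> bool:
    """Assumed criterion: target was in the clean top-k and drops out of the adversarial top-k."""
    return target in top_k(clean, k) and target not in top_k(adv, k)


def negative_to_positive_success(clean: np.ndarray, adv: np.ndarray,
                                 target: int, k: int) -> bool:
    """Assumed criterion: target was outside the clean top-k and enters the adversarial top-k."""
    return target not in top_k(clean, k) and target in top_k(adv, k)


if __name__ == "__main__":
    clean = np.array([0.9, 0.1, 0.5, 0.3])  # clean top-2: labels 0 and 2
    adv = np.array([0.2, 0.1, 0.5, 0.3])    # adversarial top-2: labels 2 and 3
    print(positive_to_negative_success(clean, adv, target=0, k=2))  # True
    print(negative_to_positive_success(clean, adv, target=3, k=2))  # True
```

Averaging these booleans over the attacked samples of each label gives a per-label success rate. Grouping those rates by label frequency is one way to see the head/tail imbalance the abstract reports.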
Similar resources
Multilabel Text Classification for Automated Tag Suggestion
The increased popularity of tagging during the last few years can be mainly attributed to its embracing by most of the recently thriving user-centric content publishing and management Web 2.0 applications. However, tagging systems have some limitations that have led researchers to develop methods that assist users in the tagging process, by automatically suggesting an appropriate set of tags. W...
Adversarial Extreme Multi-label Classification
The goal in extreme multi-label classification is to learn a classifier which can assign a small subset of relevant labels to an instance from an extremely large set of target labels. Datasets in extreme classification exhibit a long tail of labels which have small number of positive training instances. In this work, we pose the learning task in extreme classification with large number of tail-...
Flexible Text Segmentation with Structured Multilabel Classification
Many language processing tasks can be reduced to breaking the text into segments with prescribed properties. Such tasks include sentence splitting, tokenization, named-entity extraction, and chunking. We present a new model of text segmentation based on ideas from multilabel classification. Using this model, we can naturally represent segmentation problems involving overlapping and non-contiguo...
Database-Text Alignment via Structured Multilabel Classification
This paper addresses the task of aligning a database with a corresponding text. The goal is to link individual database entries with sentences that verbalize the same information. By providing explicit semantics-to-text links, these alignments can aid the training of natural language generation and information extraction systems. Beyond these pragmatic benefits, the alignment problem is appeali...
Multilabel Associative Text Classification Using Summarization
This paper deals with the concern of curse of dimensionality in the Text Classification problem using Text Summarization. Classification and association rule mining can produce well-organized as well as precise classifiers than established techniques [1]. However, associative classification technique still suffers from the vast set of mined rules. Thus, this work brings in advantages of Automat...
Journal
Journal title: Machine Learning
Year: 2022
ISSN: 0885-6125, 1573-0565
DOI: https://doi.org/10.1007/s10994-022-06263-z